1. A method for identifying nucleotide sequences encoding interacting 
polypeptide sequences, the method comprising: 

providing a host cell containing a reporter gene operably linked to a 
transcriptional regulatory sequence which includes a binding site for a DNA-binding 
domain; 

introducing into the host cell a first chimeric gene encoding a first fusion protein, 
wherein the first fusion protein comprises a first polypeptide sequence and a DNA 
binding domain, and wherein the first chimeric gene comprises a first nucleotide 
sequence encoding the first polypeptide sequence; 

introducing into the host cell a second chimeric gene encoding a second fusion 
protein, wherein the second fusion protein comprises a second polypeptide sequence and 
an activation tag, and wherein the second chimeric gene comprises a second nucleotide 
sequence encoding the second polypeptide sequence; 

culturing the host cell for a time sufficient to allow an interaction of the first 
fusion protein and the second fusion protein, wherein the interaction results in a 
measurable change in expression of the reporter gene; 

selecting the host cell based upon the measurable change in expression of the 
reporter gene; and 

sequencing the first nucleotide sequence and the second nucleotide sequence, to 
thereby identify nucleotide sequences encoding interacting polypeptide sequences. 

2. The method of claim 1, wherein the host cell is a prokaryotic cell. 

3. The method of claim 1, wherein the host cell is a eukaryotic cell. 

4. The method of claim 1, wherein the first nucleotide sequence is derived from a 
nucleic acid library. 

5. The method of claim 1, wherein the second nucleotide sequence is derived 
from a nucleic acid library. 

6. The method of claim 1, wherein the first nucleotide sequence and the second 
nucleotide sequence are derived from nucleic acid libraries. 

7. The method of claim 6, wherein the nucleic acid libraries are genomic DNA 
libraries. 

8. The method of claim 7, wherein the genomic DNA libraries are whole genome 
shotgun genomic libraries. 

9. The method of claim 7, wherein the genomic DNA libraries are reduced 
representation shotgun genomic libraries. 

10. The method of claim 7, wherein the genomic DNA libraries are 
hypomethylated shotgun genomic libraries. 

11. The method of claim 7, wherein the genomic DNA libraries are 
hypermethylated shotgun genomic libraries. 

12. The method of claim 6, wherein the nucleic acid libraries are cDNA libraries. 

13. The method of claim 6, wherein the nucleic acid libraries are 5' methionine- 
enriched DNA libraries. 

14. The method of claim 1, wherein the sequencing of the first nucleotide 
sequence and the second nucleotide sequence is carried out without amplifying one or 
both of the sequences after selecting the host cell based upon the measurable change in 
expression of the reporter gene. 

15. The method of claim 1, wherein the sequencing of the first nucleotide 
sequence and the second nucleotide sequence is carried out without amplifying either 
sequence after selecting the host cell based upon the measurable change in expression of 
the reporter gene. 

Attorney Docket No. 12130-009001 

16. The method of claim 15, wherein, prior to sequencing, the first nucleotide 
sequence and the second nucleotide sequence are purified from the host cell in the same 
compartment of a multi-compartment device. 

17. The method of claim 15, wherein, prior to sequencing, the first nucleotide 
sequence and the second nucleotide sequence are purified from the host cell in the same 
well of a 96 well vessel. 

18. The method of claim 16, wherein sequencing reactions for the first nucleotide 
sequence and the second nucleotide sequence are carried out in the same well of a 384 
well vessel. 

19. The method of claim 15, wherein, prior to sequencing, the first nucleotide 
sequence is purified from the host cell in a first well of a first 96 well vessel and the 
second nucleotide sequence is purified from the host cell in a second well of a second 96 
well vessel, wherein the first and second wells of the first and second 96 well vessels 
occupy the same relative position in each of the 96 well vessels. 

20. The method of claim 19, wherein sequencing reactions for the first nucleotide 
sequence are carried out in a first well of a first 384 well vessel and sequencing reactions 
for the second nucleotide sequence are carried out in a second well of a second 384 well 
vessel, wherein the first and second wells of the first and second 384 well vessels occupy 
the same relative position in each of the 384 well vessels. 

21. The method of claim 1, further comprising preparing a computer readable 
record comprising an entry which includes a first identifier corresponding to the first 
polypeptide sequence and a second identifier corresponding to a binding property of the 
first polypeptide sequence. 

22. The method of claim 1, wherein prior to the selecting of the cell based upon 
the measurable change in expression of the reporter gene, the host cell is placed on a 
robot compatible substrate which permits automated picking of cells exhibiting the 
measurable change in expression of the reporter gene. 

23. The method of claim 22, further comprising using an automated device to 
select the host cell exhibiting the measurable change in expression of the reporter gene. 

24. A method for identifying nucleotide sequences encoding interacting 
polypeptide sequences, the method comprising: 

providing a cell population comprising a plurality of host cells, wherein each of 
the plurality of host cells contains a reporter gene operably linked to a transcriptional 
regulatory sequence which includes a binding site for a DNA-binding domain; 

introducing into each of the plurality of host cells a first chimeric gene encoding a 
first fusion protein, wherein the first fusion protein comprises a first polypeptide 
sequence and a DNA binding domain, wherein the first chimeric gene comprises a first 
nucleotide sequence which encodes the first polypeptide sequence, and wherein the first 
nucleotide sequence is different in the first chimeric gene introduced into each of the 
plurality of host cells; 

introducing into each of the plurality of host cells a second chimeric gene 
encoding a second fusion protein, wherein the second fusion protein comprises a second 
polypeptide sequence and an activation tag, wherein the second chimeric gene comprises 
a second nucleotide sequence which encodes the second polypeptide sequence, and 
wherein the second nucleotide sequence is different in the second chimeric gene 
introduced into each of the plurality of host cells; 

culturing the plurality of host cells for a time sufficient to allow an interaction of 
the first fusion protein and the second fusion protein, wherein the interaction results in a 
measurable change in expression of the reporter gene; 

selecting the plurality of host cells based upon the measurable change in 
expression of the reporter gene; and 

sequencing the first nucleotide sequence and the second nucleotide sequence 
contained in each of the plurality of host cells, to thereby identify nucleotide sequences 
encoding interacting polypeptide sequences. 

25. The method of claim 24, wherein the plurality of host cells comprises at least 
100 cells. 

26. The method of claim 25, wherein the plurality of host cells comprises at least 
1,000 cells. 

27. The method of claim 26, wherein the plurality of host cells comprises at least 
10,000 cells. 

28. The method of claim 27, wherein the plurality of host cells comprises at least 
100,000 cells. 

29. The method of claim 28, wherein the plurality of host cells comprises at least 
1,000,000 cells. 

30. The method of claim 29, wherein the plurality of host cells comprises at least 
10,000,000 cells. 

31. The method of claim 30, wherein the plurality of host cells comprises at least 
100,000,000 cells. 

32. The method of claim 31, wherein the plurality of host cells comprises at least 
1,000,000,000 cells. 

33. The method of claim 24, wherein at least 1,000 different first chimeric genes 
are introduced into the plurality of host cells. 

34. The method of claim 33, wherein at least 10,000 different first chimeric genes 
are introduced into the plurality of host cells. 

35. The method of claim 34, wherein at least 100,000 different first chimeric 
genes are introduced into the plurality of host cells. 

36. The method of claim 24, wherein at least 1,000 different second chimeric 
genes are introduced into the plurality of host cells. 

37. The method of claim 36, wherein at least 10,000 different second chimeric 
genes are introduced into the plurality of host cells. 

38. The method of claim 37, wherein at least 100,000 different second chimeric 
genes are introduced into the plurality of host cells. 

39. The method of claim 26, wherein the plurality of host cells are prokaryotic 
cells. 

40. The method of claim 26, wherein the plurality of host cells are eukaryotic 
cells. 

41. The method of claim 26, wherein a first DNA-binding domain is encoded by 
the first chimeric gene introduced into a first subset of the plurality of host cells, and 
wherein a second DNA-binding domain is encoded by the first chimeric gene introduced 
into a second subset of the plurality of host cells. 

42. The method of claim 41, wherein a third DNA-binding domain is encoded by 
the first chimeric gene introduced into a third subset of the plurality of host cells. 

43. The method of claim 26, wherein the DNA-binding domain is fused to the 
amino terminus of the first polypeptide sequence in a first subset of the plurality of host 
cells, and wherein the DNA-binding domain is fused to the carboxy terminus of the first 
polypeptide sequence in a second subset of the plurality of host cells. 

44. The method of claim 26, wherein a first activation tag is encoded by the 
second chimeric gene introduced into a first subset of the plurality of host cells, and 
wherein a second activation tag is encoded by the second chimeric gene introduced into a 
second subset of the plurality of host cells. 

45. The method of claim 44, wherein a third activation tag is encoded by the 
second chimeric gene introduced into a third subset of the plurality of host cells. 

46. The method of claim 39, wherein the first nucleotide sequence is derived 
from a nucleic acid library. 

47. The method of claim 39, wherein the second nucleotide sequence is derived 
from a nucleic acid library. 

48. The method of claim 39, wherein the first nucleotide sequence and the second 
nucleotide sequence are derived from nucleic acid libraries. 

49. The method of claim 48, wherein the nucleic acid libraries are genomic DNA 
libraries. 

50. The method of claim 49, wherein the genomic DNA libraries are whole 
genome shotgun genomic libraries. 

51. The method of claim 49, wherein the genomic DNA libraries are reduced 
representation shotgun genomic libraries. 

52. The method of claim 49, wherein the genomic DNA libraries are 
hypomethylated shotgun genomic libraries. 

53. The method of claim 49, wherein the genomic DNA libraries are 
hypermethylated shotgun genomic libraries. 

54. The method of claim 48, wherein the nucleic acid libraries are cDNA 
libraries. 

55. The method of claim 48, wherein the nucleic acid libraries are 5' methionine- 
enriched DNA libraries. 

56. The method of claim 26, wherein the sequencing of the first nucleotide 
sequence and the second nucleotide sequence in each of the host cells is carried out 
without amplifying either sequence after selecting the plurality of host cells based upon 
the measurable change in expression of the reporter gene. 

57. The method of claim 56, wherein, prior to sequencing, the first nucleotide 
sequence and the second nucleotide sequence for each of the plurality of host cells are 
purified from the respective host cells in the same well of a 96 well vessel. 

58. The method of claim 57, wherein sequencing reactions for the first nucleotide 
sequence and the second nucleotide sequence for each of the plurality of host cells are 
carried out in the same well of a 384 well vessel. 

59. The method of claim 26, wherein prior to selecting of the cells based upon the 
measurable change in expression of the reporter gene, the plurality of host cells are 
placed on a robot compatible substrate which permits automated picking of cells 
exhibiting the measurable change in expression of the reporter gene. 

60. The method of claim 59, further comprising using an automated device to 
pick the plurality of host cells exhibiting the measurable change in expression of the 
reporter gene. 

61. The method of claim 60, wherein the plurality of host cells are prokaryotic 
cells. 

62. The method of claim 61, wherein after selecting of the cells based upon the 
measurable change in expression of the reporter gene, the first nucleotide sequence and 
the second nucleotide sequence are purified from the plurality of host cells and sequenced 
without amplification of the sequences prior to the carrying out of the sequencing step. 

63. The method of claim 62, wherein the first nucleotide sequence and the second 
nucleotide sequence of each of the plurality of host cells are purified in the same vessel. 

64. The method of claim 63, wherein the first nucleotide sequence and the second 
nucleotide sequence of the plurality of host cells are purified in wells of 96 well vessels. 

65. The method of claim 64, wherein sequencing reactions for each of the first 
nucleotide sequence and the second nucleotide sequence of the plurality of host cells are 
carried out in wells of 384 well vessels. 

66. The method of claim 64, wherein the sequencing comprises single-plex 
sequencing of the first nucleotide sequence and the second nucleotide sequence. 

67. The method of claim 64, wherein the sequencing reactions in the wells of the 
384 well vessels are carried out using a reaction volume of 25 µl or less. 

68. The method of claim 64, wherein reaction products of the sequencing 
reactions are transferred directly from the well of the 384 well vessel to a capillary, 
microfabricated, or single molecule DNA sequencer. 

69. The method of claim 24, further comprising preparing a computer readable 
record comprising a plurality of entries, each entry comprising a first identifier which 
corresponds to a first polypeptide sequence and a second identifier which corresponds to 
a binding property of the first polypeptide sequence. 

70. A method for identifying nucleotide sequences encoding interacting 
polypeptide sequences, the method comprising: 

providing a cell population comprising 100,000 bacterial host cells, wherein each 
of the 100,000 bacterial host cells contains a reporter gene operably linked to a 
transcriptional regulatory sequence which includes a binding site for a DNA-binding 
domain; 

introducing into each of the 100,000 bacterial host cells a first chimeric gene 
encoding a first fusion protein, wherein the first fusion protein comprises a first 
polypeptide sequence and a DNA-binding domain, wherein the first chimeric gene 
comprises a first nucleotide sequence which encodes the first polypeptide sequence, and 
wherein the first nucleotide sequence is different in the first chimeric gene introduced 
into each of the 100,000 bacterial host cells; 

introducing into each of the 100,000 bacterial host cells a second chimeric gene 
encoding a second fusion protein, wherein the second fusion protein comprises a second 
polypeptide sequence and an activation tag, wherein the second chimeric gene comprises 
a second nucleotide sequence which encodes the second polypeptide sequence, and 
wherein the second nucleotide sequence is different in the second chimeric gene 
introduced into each of the 100,000 bacterial host cells; 

culturing the 100,000 bacterial host cells for a time sufficient to allow an 
interaction of the first fusion protein and the second fusion protein, if present, wherein the 
interaction results in a measurable change in expression of the reporter gene; 

selecting from the 100,000 bacterial host cells those cells that exhibit the 
measurable change in expression of the reporter gene, to thereby result in selected 
bacterial host cells; and 

sequencing the first nucleotide sequence and the second nucleotide sequence 
contained in selected bacterial host cells, to thereby identify nucleotide sequences 
encoding interacting polypeptide sequences. 

71. A method for identifying nucleotide sequences encoding interacting 
polypeptide sequences, the method comprising: 

providing a cell population comprising a plurality of host cells, wherein each of 
the plurality of host cells contains a reporter gene operably linked to a transcriptional 
regulatory sequence which includes a binding site for a DNA-binding domain; 

introducing into each of the plurality of host cells a first chimeric gene encoding a 
first fusion protein, wherein the first fusion protein comprises a first polypeptide 
sequence and a DNA binding domain, and wherein the first chimeric gene comprises a 
first nucleotide sequence encoding the first polypeptide sequence; 

introducing into each of the plurality of host cells a second chimeric gene 
encoding a second fusion protein, wherein the second fusion protein comprises a second 
polypeptide sequence and an activation tag, wherein the second chimeric gene comprises 
a second nucleotide sequence which encodes the second polypeptide sequence, and 
wherein the second nucleotide sequence is different in the second chimeric gene 
introduced into each of the plurality of host cells; 

culturing the plurality of host cells for a time sufficient to allow an interaction of 
the first fusion protein and the second fusion protein, if present, wherein the interaction 
results in a measurable change in expression of the reporter gene; 

selecting from the plurality of host cells those cells that exhibit the measurable 
change in expression of the reporter gene, to thereby result in selected host cells; 

purifying the second nucleotide sequence from each of the selected host cells, 
wherein the purification is carried out by an automated process in compartments of a 
multi-compartment device; and 

sequencing the second nucleotide sequence from each of the selected host cells, to 
thereby identify nucleotide sequences encoding interacting polypeptide sequences. 

72. The method of claim 71, wherein the second nucleotide sequences from each 
of the selected host cells are purified in wells of a 96 well vessel. 

73. The method of claim 72, wherein sequencing reactions for the second 
nucleotide sequences are carried out in wells of a 384 well vessel. 



74. A method for identifying nucleotide sequences encoding interacting 
polypeptide sequences, the method comprising: 

providing a cell population comprising a plurality of host cells, wherein each of 
the plurality of host cells contains a reporter gene operably linked to a transcriptional 
regulatory sequence which includes a binding site for a DNA-binding domain; 

introducing into each of the plurality of host cells a first chimeric gene encoding a 
first fusion protein, wherein the first fusion protein comprises a first polypeptide 
sequence and a DNA binding domain, wherein the first chimeric gene comprises a first 
nucleotide sequence which encodes the first polypeptide sequence, and wherein the first 
nucleotide sequence is different in the first chimeric gene introduced into each of the 
plurality of host cells; 

introducing into each of the plurality of host cells a second chimeric gene 
encoding a second fusion protein, wherein the second fusion protein comprises a second 
polypeptide sequence and an activation tag, and wherein the second chimeric gene 
comprises a second nucleotide sequence encoding the second polypeptide sequence; 

culturing the plurality of host cells for a time sufficient to allow an interaction of 
the first fusion protein and the second fusion protein, if present, wherein the interaction 
results in a measurable change in expression of the reporter gene; 

selecting from the plurality of host cells those cells that exhibit the measurable 
change in expression of the reporter gene, to thereby result in selected host cells; 

purifying the first nucleotide sequence from each of the selected host cells, 
wherein the purification is carried out by an automated process in compartments of a 
multi-compartment device; and 

sequencing the first nucleotide sequence from each of the selected host cells, to 
thereby identify nucleotide sequences encoding interacting polypeptide sequences. 

75. The method of claim 74, wherein the first nucleotide sequences from each of 
the selected host cells are purified in wells of a 96 well vessel. 

76. The method of claim 75, wherein sequencing reactions for the first nucleotide 
sequences are carried out in wells of a 384 well vessel. 

77. The method of claim 1, further comprising shotgun sequencing a nucleic acid 
library or the genome of an organism or a virus, wherein the method identifies in serial or 
in parallel via a two-hybrid assay the protein-protein interactions of polypeptides encoded 
by the nucleic acid library or the genome of the organism or virus. 

78. The method of claim 77, wherein nucleic acid vectors used to carry out the 
shotgun sequencing are the same as the nucleic acid vectors used to carry out the two- 
hybrid assay. 

79. The method of claim 77, wherein the method identifies in serial the protein- 
protein interactions of polypeptides encoded by the nucleic acid library or the genome of 
the organism or virus. 

80. The method of claim 77, wherein the method identifies in parallel the protein- 
protein interactions of polypeptides encoded by the nucleic acid library or the genome of 
the organism or virus. 

81. The method of claim 1, wherein the selected host cell is grown, prior to 
sequencing of the first and second nucleotide sequences, in a well of a plate having at 
least 96 wells. 

82. The method of claim 71, wherein each of the selected host cells are grown, 
prior to sequencing of the second nucleotide sequence from each of the selected host 
cells, in wells of plates having at least 96 wells. 

83. The method of claim 74, wherein each of the selected host cells are grown, 
prior to sequencing of the first nucleotide sequence from each of the selected host cells, 
in wells of plates having at least 96 wells. 